ABSTRACT

Collaborative editing is achieved by distinct sites that work independently on (a copy of) a shared document.
Conflicts may arise during this process and must be solved by the collaborative editor. In pure Peer to Peer
collaborative editing, no centralization, locks, or time-stamps are used, which makes conflict resolution difficult.
We propose an algorithm which relies on the notion of semantic dependence and avoids the need for any
integration transformation to solve conflicts. Furthermore, it doesn't use any history file recording the operations
performed since the start of the editing process. We show how to define editing operations for semi-structured
documents, i.e. XML-like trees, that are enriched with information derived for free from the editing process.
Then we define the semantic dependence relation required by the algorithm and we present preliminary results
obtained by a prototype implementation.

1 Introduction 

Collaborative editing is becoming more and more popular (writing articles with SVN, setting appointments with
Doodle, Wikipedia articles, ...) and it is achieved by distinct sites that work independently on (a copy of) a shared
document. Several systems have been designed to achieve this task, but most of them use centralization and locks,
or weak centralization via time-stamps. An alternative approach is the Peer to Peer approach (P2P in short), where
new sites can freely join the process and no central site is required to coordinate the work. This solution is more
secure and scalable since the lack of a central site removes a single point of failure and allows for a huge number
of participants. In this paper we focus on editing semi-structured documents, called XML trees from now on, using
the basic editing operations: adding or deleting edges, and changing labels in the document. Since the process is
concurrent, conflicts can occur: for instance a site $s_1$ changes the label Introduction of an edge to Definition
while another site $s_2$ wants to relabel Introduction to Abstract. Then $s_1$ informs $s_2$ of the operation
performed, and conversely. Executing the corresponding operations leads to an incoherent state since the sites no
longer have identical copies of the shared document. In the optimistic P2P approach, each operation is accounted
for and conflicts are solved by replacing the execution of an operation $op_2$ performed concurrently with $op_1$
by $IT(op_2, op_1)$, where $IT$ is an integration transformation defined on the set of operations. This
transformation computes the effect of the execution of $op_1$ on $op_2$, i.e. the dependence of $op_2$ on $op_1$.

In the word case, the transformations proposed in [?], [3], [?], [10], [?] turned out to be non-convergent, see [?]
for counter-examples. In particular, none of these transformations satisfies both properties TP1 (a local confluence
property) and TP2 (integration stability) that are sufficient to ensure convergence [?]. Currently, no convergent
algorithm based on the integration transformation is known for words. For XML trees, algorithms and operations
have been proposed (like in [?]), but they have the same problem as in the word case or use time-stamps (see [?]),
i.e. are not true P2P.

We propose a new algorithm that relies on the semantic dependence of operations, which allows us to reduce the
integration transformation to a trivial one: $IT(op_2, op_1) = op_2$. This is possible since we enrich the data
structure by adding information coming for free from the editing process on trees, yielding an important property:
each edge is uniquely labelled. Furthermore, labels also record the level of dependence of the sites that created or
modified them. These properties yield a simple convergent editing algorithm which doesn't require any history
file recording all operations done since the beginning of the editing process. Since a word can be encoded as a tree,
this algorithm also solves the word case, at the price of a more complex representation. These ideas have been
implemented in a prototype which shows that the editing is done efficiently and that the process is scalable.

Section 2 discusses the current approaches to collaborative editing, and we present our editing algorithm in
Section 3. The data structure used for XML trees is described in Section 4 and our first results are given in
Section 5. Missing proofs can be found in the full research report.

2 Related Works 

Many collaborative editing frameworks have been proposed, and we discuss only the most prominent ones.

Document synchronization framework. IceCube (see [?]) is an operation-based generic approach for reconciling
divergent copies. Conflicts are solved on a selected site using optimization techniques relying on static semantic
constraints (generated by document rules) and dynamic ones (generated by the current state of the document).
The problem is NP-hard and this approach is not a true P2P solution (each conflict is solved by one site). The
Harmony project [4] is a state-based generic framework for merging two divergent copies of documents. These
documents are tree-like data structures similar to the unordered trees that we discuss in Section 4. The
synchronization process exploits XML-schema information and is proved terminating and convergent for two sites.

Integration transformation based framework. So6 [?] is a generic framework based on the SOCT4 algorithm,
which requires the local confluence property (TP1). It relies on continuous global order information delivered by a
timestamper, which is not pure P2P since it relies on a central server for delivering these time-stamps. The Goto
system (Sun et al. [?]) and SDT (Du Li and Rui Li [?]) rely on forward and backward transformations (for undoing
operations). These algorithms need to reorder the history of operations, which involves a lot of computation to
update the current state in order to ensure convergence.

Goto (Sun et al. [?]), Adopted (Ressel et al. [?]) and SDT (Du Li and Rui Li [?]) rely on the local confluence
property (TP1) and on the integration stability property (TP2) to guarantee convergence. A main issue is to ensure
that operation integration takes place in the same context and returns the same result, and each algorithm has its
own solution. For instance, Goto uses a forward (IT) and a backward (ET) transformation to reorder the history
(the record of all operations performed). Adopted computes the sequence of integrations as a path in a
multi-dimensional cube. The main drawback of these approaches is that it is hard to design a set of useful
operations and integration transformations that satisfy both TP1 and TP2. For instance, no such set is known in the
word case nor for linearly ordered structures.

The set of operations given by Davis and Sun provides operations on trees for the Grove editor [?], but this set
doesn't satisfy the local confluence property TP1. Therefore, there is little hope of getting a convergent editing
process. OpTree [?] presents a framework for editing trees and graphical documents using Opt or Soct2, and relies
extensively on history files containing all operations performed on the data. The complexity is at least quadratic in
the size of the log file and no formal proof of correctness is given.

A main problem of all these solutions, even when convergence is guaranteed, is that they rely on the manipulation
of history files that record all operations performed, and these computations can become quite expensive.

3 Conflict-free Solution



We propose a generic scheme for collaborative editing which avoids the pitfalls of previous works by avoiding the
need to solve conflicts. First we give an abstract presentation of this editing process and of the properties required
to ensure its correctness, then we show how it works for XML trees.

Each site participating in the editing process executes the same algorithm (given in Figure 1) and performs
operations on its copy of the shared document. Operations belong to a set of operations $Op$, and we assume
that there is a partial order $\succ_s$ (i.e. an irreflexive, antisymmetric, transitive relation) on operations. We write
$op_1 \parallel_s op_2$ iff $op_1 \not\succ_s op_2$ and $op_2 \not\succ_s op_1$. This ordering expresses causal
dependencies of the editing process: $op_1 \succ_s op_2$ iff $op_2$ depends on $op_1$ (for instance $op_1$ creates
an edge and $op_2$ relabels this edge). In our model, for every $op \in Op$, the set $OpDep$ of operations $op'$ such
that $op \succ_s op'$ is a bounded set. We show how to compute this relation for XML trees in Section 4.3.
A sequence of operations is denoted by $[op_1; \ldots; op_n]$ and the result of applying $op_1$, followed by
$op_2, \ldots, op_n$ to the document $t$ is denoted by $[op_1; \ldots; op_n](t)$. The set of operations $(Op, \succ_s)$
is independent iff $\forall op, op' \in Op,\ \forall t,\ op \parallel_s op' \Rightarrow [op; op'](t) = [op'; op](t)$.

A sequence $[op_1; \ldots; op_n]$ is valid if for all $op_i, op_j$ occurring in the sequence, $op_i \succ_s op_j$ implies
$i < j$. In other words, the sequence is a linearization of the partial order defined by $\succ_s$ on the set
$\{op_1, \ldots, op_n\}$. Given a valid sequence $[op_1; \ldots; op_n]$, a substitution $\sigma$ of $\{1, \ldots, n\}$ is
compliant with $\succ_s$ iff the sequence $[op_{\sigma(1)}; \ldots; op_{\sigma(n)}]$ is valid. This means that whenever
$op_i \succ_s op_j$, the operation $op_i$ still occurs before $op_j$ in the permuted sequence; in other terms, $\sigma$
doesn't change the causality relation between operations. The collaborative editing algorithm that we propose relies
on the following proposition (see footnote 1):

Proposition 1 Let $(Op, \succ_s)$ be an independent set of operations. Let $[op_1; \ldots; op_n]$ be a valid sequence
of operations in $Op$ and let $\sigma$ be a substitution compliant with $\succ_s$. Then
$[op_1; \ldots; op_n](t) = [op_{\sigma(1)}; \ldots; op_{\sigma(n)}](t)$.

PROOF. Firstly, we prove that exchanging two consecutive non-dependent operations doesn't change the result.

Let $\tau_i$ be the substitution such that $\tau_i(i) = i+1$, $\tau_i(i+1) = i$ and $\tau_i(k) = k$ otherwise. Let
$[op_1; \ldots; op_n]$ be a valid sequence and let $op_i \parallel_s op_{i+1}$. We prove that
$[op_1; \ldots; op_n](t) = [op_{\tau_i(1)}; \ldots; op_{\tau_i(n)}](t)$ as follows:

$$\begin{aligned}
[op_{\tau_i(1)}; \ldots; op_{\tau_i(n)}](t) &= [op_1; \ldots; op_{i-1}; op_{i+1}; op_i; op_{i+2}; \ldots; op_n](t) \\
&= [op_{i+1}; op_i; op_{i+2}; \ldots; op_n](t') \quad \text{with } t' = [op_1; \ldots; op_{i-1}](t) \\
&= [op_i; op_{i+1}; op_{i+2}; \ldots; op_n](t') \quad \text{since } (Op, \succ_s) \text{ is independent} \\
&= [op_1; \ldots; op_n](t)
\end{aligned}$$

Secondly, we prove the result by induction on the number of elements in the sequence $[op_1; \ldots; op_n]$.

• Base case: $n = 1$, straightforward.

• Induction case: Let $[op_1; \ldots; op_n]$ be a valid sequence of $Op$.
Let $[op_{\sigma(1)}; \ldots; op_{\sigma(n)}]$ be another linearization of $\{op_1, \ldots, op_n\}$.
We prove that $[op_1; \ldots; op_n](t) = [op_{\sigma(1)}; \ldots; op_{\sigma(n)}](t)$.

By definition $op_1$ is a maximal element of $\succ_s$. This element occurs at some position $j$ in
$l = [op_{\sigma(1)}; \ldots; op_{\sigma(n)}]$. Let $\tau_k$ be the substitution that exchanges the elements of $l$ at
positions $k$ and $k+1$ and leaves the other elements unchanged.

Since $op_1$ is maximal, any operation $op'$ occurring in $l$ at a position $k < j$ is such that $op' \parallel_s op_1$.

Therefore there is a sequence $\tau_{j-1}, \ldots, \tau_1$ of substitutions such that the application of these
substitutions to $[op_{\sigma(1)}; \ldots; op_{\sigma(n)}]$ yields a sequence $[op_1; op'_2; \ldots; op'_n]$ such that
(i) $[op_1; op'_2; \ldots; op'_n](t) = [op_{\sigma(1)}; \ldots; op_{\sigma(n)}](t)$ (by our first result) and
(ii) $[op_1; op'_2; \ldots; op'_n]$ is a linearization of $op_1, \ldots, op_n$.

Therefore $[op'_2; \ldots; op'_n]$ is a linearization of $op_2, \ldots, op_n$.

By induction hypothesis, we get $[op'_2; \ldots; op'_n](t') = [op_2; \ldots; op_n](t')$.

Taking $t' = [op_1](t)$ yields the result.

Footnote 1: This result is a classical result in the field of partial orders.

Another statement of the proposition is that the execution of any linearization of a partial order on some initial
value yields the same result.
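
As an illustration of Proposition 1, the following minimal sketch enumerates every linearization of a small partial order and checks that they all produce the same document. The toy document (a dictionary), the operations `add` and `relabel`, and the names `OPS` and `DEPENDS` are assumptions made for this example only; they are not the XML operations defined later in the paper.

```python
from itertools import permutations

def add(key):
    # Creates an entry with no value yet; does nothing if it already exists.
    return lambda doc: doc if key in doc else {**doc, key: None}

def relabel(key, text):
    # Relabels an existing entry; does nothing when the entry is absent.
    return lambda doc: {**doc, key: text} if key in doc else doc

# Hypothetical operations: "r" depends on "a", while "b" is independent of both.
OPS = {"a": add("x"), "r": relabel("x", "Intro"), "b": add("y")}
DEPENDS = {("a", "r")}  # pairs (op, op') such that op' depends on op

def is_valid(names):
    """A sequence is valid if no operation occurs before one it depends on."""
    return all((names[j], names[i]) not in DEPENDS
               for i in range(len(names)) for j in range(i + 1, len(names)))

def run(names, doc):
    """Apply the named operations to doc, from left to right."""
    for name in names:
        doc = OPS[name](doc)
    return doc

linearizations = [p for p in permutations(OPS) if is_valid(p)]
results = [run(p, {}) for p in linearizations]
```

Every valid ordering yields the same dictionary, whereas an invalid ordering such as `("r", "a", "b")` loses the relabeling because `r` is applied before the entry exists.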

The dependenceOf function. In our setting, operations are issued by sites and are numbered with an operation
number on this site. For instance, to delete a node in a tree, the operation is defined by the action delete, the site
identifier SiteId of the site which issues this deletion and the operation number OpCount on this site. Furthermore,
the data structure (the shared document) is built using these operations and stores this information for each
component (nodes or edges for trees, for instance). A request r is a triple composed of an operation op, a site
identifier SiteId, and an operation number OpCount. We assume that there is a function dependenceOf(r) which
returns, for each request r, the pairs (SiteId' : OpCount') of the operations $op'$ such that $op' \succ_s op$. Actually,
this function may return such pairs only for the minimal (for $\succ_s$) operations $op'$ such that $op' \succ_s op$. In
section ??, we show how to define this function effectively and in a simple way for XML trees.

The FCedit (Fast Collaborative Editing) algorithm. The procedures (except Main()) of the generic distributed
algorithm FCedit are given in Figure 1. Each site has a unique identifier stored in SiteId, an operation counter
stored in OpCount, a copy of the document t and a list WaitingList of requests awaiting treatment. The
function dependenceOf(r) with r = (op, SiteId : OpCount) returns the pairs (nSite : cSite), with nSite a site
identifier and cSite some operation count, such that op depends on an operation issued by site nSite with operation
count cSite. This function is defined simultaneously with the data structure, the set of operations and the
dependence relation; see Section 4.3 for the definition used for XML trees. The Main() procedure (not given in
Figure 1) calls Initialize() and enters a loop which terminates when the editing process stops. In the loop, the
algorithm chooses non-deterministically either to set the variable op to some user input and to execute
GenerateRequest(op), or to execute Receive(r). GenerateRequest(op) simply updates the local variables and
broadcasts the corresponding request to the other sites. Receive(r) adds r to WaitingList and executes all operations
of requests that become executable thanks to r (relying on Execute and IsExecutable).
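
The procedures described above can be sketched in Python as follows. This is only an illustrative reading of Figure 1, under several assumptions: the class name `Site`, the callbacks `dependence_of` and `apply_op`, and the toy operations used in the demonstration are hypothetical; broadcasting is left to the caller; `receive` loops until no waiting request is executable; and `generate_request` also records the local operation in the state vector so that later local operations may depend on it.

```python
class Site:
    def __init__(self, site_id, doc, dependence_of, apply_op):
        self.site_id = site_id
        self.op_count = 1              # OpCount, initialised to 1 as in Initialize
        self.received = {}             # state vector SReceived (missing entry = 0)
        self.waiting = []              # WaitingList
        self.doc = doc                 # local copy t of the document
        self.dependence_of = dependence_of
        self.apply_op = apply_op

    def is_executable(self, request):
        op, site, count = request
        # The previous operation of a remote site must have been executed.
        if site != self.site_id and self.received.get(site, 0) != count - 1:
            return False
        # Every operation that op depends on must have been executed.
        return all(self.received.get(s, 0) >= c for s, c in self.dependence_of(op))

    def execute(self, request):
        op, site, count = request
        self.received[site] = count    # update the state vector
        if request in self.waiting:
            self.waiting.remove(request)
        self.doc = self.apply_op(op, self.doc)

    def generate_request(self, op):
        request = (op, self.site_id, self.op_count)
        if not self.is_executable(request):
            return None
        self.op_count += 1
        self.execute(request)          # assumption: local ops update the state vector too
        return request                 # the caller broadcasts this request

    def receive(self, request):
        self.waiting.append(request)
        progress = True
        while progress:                # repeat until no waiting request is executable
            progress = False
            for r in list(self.waiting):
                if self.is_executable(r):
                    self.execute(r)
                    progress = True

# Toy operations (hypothetical): ("add", id, None) and ("set", id, value).
def dependence_of(op):
    kind, target, _ = op
    return [] if kind == "add" else [target]   # "set" depends on the creation of target

def apply_op(op, doc):
    kind, target, value = op
    if kind == "add":
        return {**doc, target: None}
    return {**doc, target: value} if target in doc else doc

s1, s2, s3 = (Site(i, {}, dependence_of, apply_op) for i in (1, 2, 3))
r_add = s1.generate_request(("add", (1, 1), None))
s2.receive(r_add)
r_set = s2.generate_request(("set", (1, 1), "Abstract"))
s3.receive(r_set)      # arrives too early: kept in the waiting list
s3.receive(r_add)      # unlocks r_set
s1.receive(r_set)
```

In this run, site 3 receives the relabeling before the creation it depends on, keeps it in its waiting list, and executes it as soon as the creation arrives, so the three copies end up identical.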

The convergence property states that each site has the same copy t of the shared document after all operations 
have been received and executed by each site. Firstly, we show that requests are executed in a sequence that respects 
the dependence relation. 

Proposition 2 Let $op^s_1, \ldots, op^s_n$ be the sequence of operations generated by site $s$ using GenerateRequest.
Then the operation count associated to $op^s_i$ is $i$ and $op^s_i \succ_s op^s_j$ implies $i < j$.

PROOF. The first fact is obvious since OpCount is incremented by 1 at each creation of an executable request,
starting from the value 1 set by Initialize. Lines 6 to 9 of isExecutable(r = (op, #Site : #Op)) test that each
operation op', issued by site nSite with operation number cSite, on which the operation op contained in r depends,
has been executed. This is ensured by returning false if SReceived[nSite] < cSite. □

Proposition 3 Let $s, s'$ be two distinct sites. Let $op^s_1, \ldots, op^s_n$ be the sequence of operations generated by
$s$ using GenerateRequest. Let $op^{s'}_1, \ldots, op^{s'}_m$ be the sequence of operations executed by $s'$ using
GenerateRequest or Receive. If $op^{s'}_{j_i}$ is the execution of $op^s_i$ (from $s$) by $s'$, then the sequence
$op^{s'}_{j_1}, \ldots, op^{s'}_{j_n}$ satisfies $j_1 < j_2 < \ldots < j_n$ (i.e. the execution order on $s'$ respects the
creation order on $s$, hence the dependence relation).

Proof. Before any execution of an operation (line 6 of GenerateRequest or line 5 of Receive) a call to isExecutable
is performed. The first step of this function returns false for an operation of site $s$ numbered $n$ if the operation of
site $s$ numbered $n - 1$ has not been executed. Therefore the execution order of the operations $op^s_i$ respects
their creation order. Since the creation order respects the dependence relation, we are done. □

Proposition 4 The algorithm FCedit is convergent if the set of operations is independent. 

1 INITIALIZE():
2 begin
3     ∀i, SReceived[i] = 0    // State vector of received operations
4     (SiteId, Obj, OpCount, WaitingList) = (n, o, 1, {})
5 end

1 GENERATEREQUEST(op):    // The user emits an operation
2 begin
3     Let r = (op, SiteId : OpCount)
4     if isExecutable(r) then
5         OpCount = OpCount + 1
6         t = op(t)    // Apply the operation
7         broadcast r to the other participants
8 end

1 RECEIVE(r):    // This function is executed when a request is received
2 begin
3     WaitingList = WaitingList ∪ {r}
4     forall r ∈ WaitingList | isExecutable(r) do
5         execute(r)    // Execute all executable requests
6 end

1 ISEXECUTABLE(r):    // Check that request r is executable
2 begin
3     Let r = (op, #Site : #Op)
      // Check that the previous operation of the same site has been executed
4     if #Site ≠ SiteId ∧ SReceived[#Site] ≠ #Op − 1 then
5         return false
      // Check that all dependencies have been executed
6     for (nSite : cSite) ∈ dependenceOf(r) do
7         if SReceived[nSite] < cSite then
8             return false
9     return true
10 end

1 EXECUTE(r):    // Execute a request r
2 begin
3     r = (op, #Site : #Op)
4     SReceived[#Site] = #Op    // Update the state vector
5     WaitingList = WaitingList \ {r}    // Remove r from the waiting list
6     t = op(t)    // Apply the operation
7 end

Figure 1: The Concurrent Editing Algorithm



PROOF. Let $[op_1; \ldots; op_m]$ be the sequence executed on site $s$. We prove that $[op_1; \ldots; op_m]$ is a
linearization of the partial order defined by $\succ_s$ on $\{op_1, \ldots, op_m\}$.

Let $op_i$ and $op_j$ be such that $op_i$ and $op_j$ have been generated by the same site $s'$. The subsequence
$[op_{j_1}; \ldots; op_{j_l}]$ corresponding to the operations received from site $s'$ is such that
$op_{j_k} \succ_s op_{j_{k'}}$ implies $j_k < j_{k'}$ (by Proposition 3).

Let $op_i$ and $op_j$ be such that $op_i$ has been generated by $s'$ and $op_j$ has been generated by $s''$. If
$op_i \succ_s op_j$, the function isExecutable called on the request $r = (op_j, \ldots)$ before executing $r$ on site $s$
checks that $op_i$ has been executed on site $s$ (lines 6 to 9 of isExecutable). Therefore we get that $i < j$.

Therefore $[op_1; \ldots; op_m]$ is a linearization of the partial order induced by $\succ_s$ on
$\{op_1, \ldots, op_m\}$. Since each site executes a linearization of the same partial order, Proposition 1 yields that
each site computes the same value for the shared document.

□



4 Conflict free operations for XML Trees 

The basic editing operations on trees are insertion, deletion and relabeling of a node. Actually, since we consider
edge-labelled trees instead of node-labelled trees, insertion and deletion are performed on edges instead of nodes.
Firstly, we consider unordered trees, and we show in Section 4.4 how to reestablish the ordering between edges,
which yields a data structure corresponding to XML trees.

4.1 Data Structure



The information stored in nodes (or edges in our case) can be described as a word on some finite alphabet $\Sigma$.
To get an independent set of operations containing relabeling, we must have a much more complex labeling that we
describe now.

The set of identifiers ID. Each site is uniquely designated by its identifier, which is a natural number (IP
numbers could be used as well). The set of identifiers is the set ID of pairs (SiteNumber : NbOpns) where
$NbOpns \in \mathbb{N}$ denotes some numbering of operations on this site.

The set of labels L. A label is a pair $(l, id)$ where $id \in ID$ and $l$ is a triple $(lab, id', dep)$ with
$lab \in \Sigma_L$, where $\Sigma_L$ is a finite alphabet, $id' \in ID$, and $dep \in \mathbb{N}$ (expressing a level of
dependence).

Trees. Trees are defined by the grammar

$T \ni t ::= \{\,\} \mid \{n_1(t_1), \ldots, n_p(t_p)\}$ where $n_i = (l_i, id_i) \in L$, $t_i \in T$

where each $id_i$ occurs once in $t$.

The uniqueness of labels is guaranteed by the fact that $id_i$ = (SiteNumber : NbOpns) states that the edge has
been created by operation NbOpns of site SiteNumber.

Trees are unordered, i.e. $\{n_1(t_1), \ldots, n_p(t_p)\}$ is identified with
$\{n_{\sigma(1)}(t_{\sigma(1)}), \ldots, n_{\sigma(p)}(t_{\sigma(p)})\}$ for any permutation $\sigma$ of $\{1, \ldots, p\}$.

Example. We give an XML document and a tree that may represent this document as the result of some editing
process.

    <?xml version="1.0" encoding="UTF-8"?>
    <Pat>
      <Phone>
        <Cellular>
          0691543545
        </Cellular>
        <Home>
          0491543545
        </Home>
      </Phone>
    </Pat>
    <Henri>
      <Address>
        45 Emile Caplant Street
      </Address>
    </Henri>

(a) XML Document

    (root)
      Pat
        Phone
          Cellular
            0691543545
          Home
            0491543545
      Henri
        Address
          45 Emile Caplant Street

(b) Schematic tree

Figure 2: Document

    { ((Pat, (1:3), 2), (1:1))({
        ((Phone, (3:4), 5), (2:1))({
          ((Home, (3:2), 1), (3:1))({ ((0491543545, (4:2), 1), (4:1))({}) }),
          ((Cellular, (5:2), 3), (5:1))({ ((0691543545, (6:2), 1), (6:1))({}) })
        })
      }),
      ((Henri, (2:3), 1), (2:2))({
        ((Address, (3:5), 2), (3:2))({
          ((45 Emile Caplant Street, (4:9), 5), (4:2))({})
        })
      })
    }
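
One possible in-memory encoding of these labelled, unordered trees is sketched below. The class name `Edge`, the constant `NO_VALUE`, the helper `all_ids`, and the choice of a dictionary keyed by identifier are assumptions of this sketch, not the paper's implementation; the values are those of the Henri subtree in Figure 2.

```python
from dataclasses import dataclass, field

NO_VALUE = None  # stands for a label that has not been set yet

@dataclass
class Edge:
    lab: object          # visible label (a letter of the label alphabet, or NO_VALUE)
    id_op: tuple         # id': identifier stored in the label triple
    dep: int             # level of dependence
    children: dict = field(default_factory=dict)  # identifier -> Edge, unordered

def all_ids(tree):
    """Yield the identifier of every edge of the tree (a dict identifier -> Edge)."""
    for ident, edge in tree.items():
        yield ident
        yield from all_ids(edge.children)

# The Henri subtree; identifiers are (SiteNumber, NbOpns) pairs.
henri = {
    (2, 2): Edge("Henri", (2, 3), 1, {
        (3, 2): Edge("Address", (3, 5), 2, {
            (4, 2): Edge("45 Emile Caplant Street", (4, 9), 5),
        }),
    }),
}

# Sibling order is irrelevant: two dictionaries with the same entries are equal.
left = {(1, 1): Edge("Pat", (1, 3), 2), (2, 2): Edge("Henri", (2, 3), 1)}
right = {(2, 2): Edge("Henri", (2, 3), 1), (1, 1): Edge("Pat", (1, 3), 2)}
```

Keying the children by identifier makes the uniqueness requirement easy to check and makes tree equality insensitive to the order in which siblings were inserted.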

4.2 Editing Operations 

We extend the set $\Sigma_L$ by a symbol NoValue that states that a label is not yet set.

Adding an edge. The operation $Add(id_p, id)$ with $id_p \neq id$ adds an edge labelled by $(l, id)$ with
$l = (NoValue, id, 0)$ under the edge labelled $(\ldots, id_p)$. When $id_p$ doesn't occur, the tree is not modified.
It is formally defined by:

$Add(id_p, id)(\{\,\}) = \{\,\}$

$Add(id_p, id)(\{n_1(t_1), \ldots, (l_i, id_i)(t_i), \ldots, n_p(t_p)\}) = \{n_1(t_1), \ldots, (l_i, id_i)(t_i \cup \{((NoValue, id, 0), id)(\{\,\})\}), \ldots, n_p(t_p)\}$
    if $id_p = id_i$

$Add(id_p, id)(\{n_1(t_1), \ldots, n_p(t_p)\}) = \{n_1(Add(id_p, id)(t_1)), \ldots, n_p(Add(id_p, id)(t_p))\}$
    if $n_i = (l_i, id_i)$ with $id_i \neq id_p$ for $i = 1, \ldots, p$

Deleting a subtree. The operation $Del(id)$ deletes the whole subtree corresponding to the unique edge labelled
by $(\ldots, id)$ (including this edge). When $id$ doesn't occur, the tree is not modified. It is formally defined by:

$Del(id)(\{\,\}) = \{\,\}$

$Del(id)(\{n_1(t_1), \ldots, (l_i, id_i)(t_i), \ldots, n_p(t_p)\}) = \{n_1(t_1), \ldots, n_{i-1}(t_{i-1}), n_{i+1}(t_{i+1}), \ldots, n_p(t_p)\}$
    if $id = id_i$

$Del(id)(\{n_1(t_1), \ldots, n_p(t_p)\}) = \{n_1(Del(id)(t_1)), \ldots, n_p(Del(id)(t_p))\}$
    if $n_i = (l_i, id_i)$ with $id_i \neq id$ for $i = 1, \ldots, p$

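
The two definitions above can be transcribed almost literally. In this sketch a tree is a dictionary mapping an identifier to a pair (label, subtree); this encoding, the function names `add` and `delete`, and the root identifier `(0, 0)` are assumptions made for illustration.

```python
NO_VALUE = None  # stands for the NoValue symbol

def add(id_p, new_id):
    """Add(id_p, new_id): insert a fresh edge under the edge identified by id_p."""
    def op(tree):
        out = {}
        for ident, (label, sub) in tree.items():
            if ident == id_p:
                # Second rule: the new edge joins the subtree of id_p.
                sub = {**sub, new_id: ((NO_VALUE, new_id, 0), {})}
            else:
                # Third rule: look for id_p deeper in the tree.
                sub = op(sub)
            out[ident] = (label, sub)
        return out
    return op

def delete(target):
    """Del(target): remove the edge identified by target and its whole subtree."""
    def op(tree):
        return {ident: (label, op(sub))
                for ident, (label, sub) in tree.items() if ident != target}
    return op

root = {(0, 0): (("root", (0, 0), 0), {})}
t = add((1, 1), (2, 1))(add((0, 0), (1, 1))(root))

# Deleting edge (1, 1) and adding a child under it are independent operations.
add_then_del = delete((1, 1))(add((1, 1), (3, 1))(t))
del_then_add = add((1, 1), (3, 1))(delete((1, 1))(t))
```

Both orders give the same tree: when the deletion comes first the insertion finds no parent and leaves the tree unchanged, and when it comes second the new edge is erased together with the subtree.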
Changing a label. The operation $ChLab(id_e, id_{op}, dep, L)$ with $id_e, id_{op} \in ID$, $dep \in \mathbb{N}$,
$L \in \Sigma_L$ replaces the label $l_e$ of the edge identified by $(\ldots, id_e)$ by $(L, id_{op}, dep)$, depending on a
comparison of the dependence levels. It is formally defined by:

$ChLab(id_e, id_{op}, dep, L)(\{n_1(t_1), \ldots, (l_e, id_e)(t_e), \ldots, n_p(t_p)\}) = \{n_1(t_1), \ldots, (l'_e, id_e)(t_e), \ldots, n_p(t_p)\}$

where $l_e = (L_e, id_m, dep_e)$ and

$$l'_e = \begin{cases}
(L, id_{op}, dep) & \text{if } dep > dep_e \text{, or else } dep = dep_e \text{ and } id_{op} < id_m \\
l_e & \text{otherwise}
\end{cases}$$

$ChLab(id_e, id_{op}, dep, L)(\{n_1(t_1), \ldots, n_p(t_p)\}) = \{n_1(ChLab(id_e, id_{op}, dep, L)(t_1)), \ldots, n_p(ChLab(id_e, id_{op}, dep, L)(t_p))\}$
    if $n_i = (l_i, id_i)$ with $id_i \neq id_e$ for $i = 1, \ldots, p$
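
A minimal sketch of the relabeling rule follows. It assumes that the label carrying the higher dependence level wins and that ties are broken in favour of the smaller operation identifier; this reading is an assumption chosen to agree with the case analysis in the proof of Proposition 5. The dictionary encoding of trees and the name `ch_lab` are also hypothetical.

```python
def ch_lab(id_e, id_op, dep, lab):
    """ChLab(id_e, id_op, dep, lab) on trees encoded as {id: (label, subtree)}."""
    def op(tree):
        out = {}
        for ident, (label, sub) in tree.items():
            if ident == id_e:
                _, id_m, dep_e = label
                # Assumed rule: higher dependence level wins, ties go to the smaller id_op.
                if dep > dep_e or (dep == dep_e and id_op < id_m):
                    label = (lab, id_op, dep)
                out[ident] = (label, sub)
            else:
                out[ident] = (label, op(sub))
        return out
    return op

# A freshly created edge carries the label (NoValue, id, 0).
tree = {(1, 1): ((None, (1, 1), 0), {})}

# Two concurrent relabelings of the same edge with the same dependence level.
op1 = ch_lab((1, 1), (2, 1), 1, "Definition")
op2 = ch_lab((1, 1), (3, 1), 1, "Abstract")

first_order = op2(op1(tree))
second_order = op1(op2(tree))
```

Whichever order is used, both sites keep the same label, so no integration transformation is needed to reconcile the two relabelings.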



4.3 Semantic Dependence 

Let the set of operations be
$Op = \{Add(id, id'), Del(id), ChLab(id, id', dep, L) \mid id, id' \in ID, dep \in \mathbb{N}, L \in \Sigma_L\}$.
The dependence relation $\succ_s$ is defined as follows:

• $Add(id_p, id) \succ_s Del(id)$: an edge can be deleted only if it has been created.

• $Add(id_{p'}, id_p) \succ_s Add(id_p, id)$: adding edge $id$ under edge $id_p$ requires that edge $id_p$ has been
created.

• $Add(id_p, id) \succ_s ChLab(id, id_{op}, dep, L)$: changing the label of edge $id$ requires that edge $id$ has been
created.

This allows us to compute the identifier on which an operation depends:

    $id_p$ for $op = Add(id_p, id)$
    $id$ for $op = Del(id)$
    $id$ for $op = ChLab(id, id_{op}, depLvl, lbl)$
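
Since an identifier is itself a pair (site : operation number), the table above directly gives the pairs expected from dependenceOf. The sketch below assumes that operations are encoded as tuples whose first component is the operation name; this encoding is hypothetical.

```python
def dependence_of(op):
    """Return the (site, operation count) pairs that must be executed before op.

    Assumed encodings: ("Add", id_p, id), ("Del", id), ("ChLab", id, id_op, dep, lab),
    where every identifier is a (site, operation count) pair.
    """
    kind = op[0]
    if kind == "Add":
        return [op[1]]        # the parent edge id_p must have been created
    if kind in ("Del", "ChLab"):
        return [op[1]]        # the target edge id must have been created
    raise ValueError(f"unknown operation: {kind}")

deps_add = dependence_of(("Add", (1, 1), (2, 1)))
deps_del = dependence_of(("Del", (2, 1)))
deps_lab = dependence_of(("ChLab", (2, 1), (3, 4), 1, "Phone"))
```

Each operation depends on exactly one identifier, so the test performed by isExecutable amounts to a single look-up in the state vector.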

Proposition 5 The set $(Op, \succ_s)$ is an independent set of operations.

Proof. We prove that if $op_1 \parallel_s op_2$ then $[op_1; op_2](t) = [op_2; op_1](t)$ by a case analysis on all
possible pairs $op_1, op_2$.

1. $op_1 = Add(id_{p_1}, id_1)$

(a) $op_2 = Add(id_{p_2}, id_2)$

• $id_{p_1} = id_2$ or $id_{p_2} = id_1$: then respectively $op_2 \succ_s op_1$ or $op_1 \succ_s op_2$, so the operations
are not independent.

• otherwise: each edge is inserted independently of the other, and whatever the order the result is the same set of
edges.

(b) $op_2 = Del(id_2)$

• $id_2 = id_{p_1}$ or $id_{p_1}$ is in the subtree of $id_2$: let $t$ be a tree and let $t_1 = Del(id_2)(t)$. By
definition $id_{p_1}$ is deleted, hence $Add(id_{p_1}, id_1)(t_1) = t_1$. Let $t_2 = Add(id_{p_1}, id_1)(t)$; then
$Del(id_2)(t_2) = t_1$ because the whole subtree is erased.

• $id_2 = id_1$: impossible, because $Add(id_{p_1}, id_1) \succ_s Del(id_1)$.

• otherwise: the edge $id_1$ is created and the edge $id_2$ is deleted, whatever the order.

(c) $op_2 = ChLab(id_2, id_{op_2}, dep_2, lbl_2)$

• $id_2 = id_1$: impossible, since the edge must be created before being relabelled because
$Add(id_{p_1}, id_1) \succ_s ChLab(id_1, id_{op_2}, dep_2, lbl_2)$.

• otherwise: Add has no effect on ChLab and vice versa.

2. $op_1 = Del(id_1)$

(a) $op_2 = Add(id_{p_2}, id_2)$: this is case 1(b).

(b) $op_2 = Del(id_2)$: if $id_1$ is in the subtree of $id_2$, then when $Del(id_2)$ is executed first there is no edge
left to delete for $Del(id_1)$, because it was already deleted by $Del(id_2)$. When $Del(id_1)$ is executed first, the
edge $id_1$ and its sub-edges are deleted first, and the remainder of the subtree of $id_2$ is deleted afterwards.
Otherwise the two subtrees are distinct.

(c) $op_2 = ChLab(id_2, id_{op_2}, dep_2, lbl_2)$

• $id_1 = id_2$: let $t' = Del(id_1)(t)$. Then $ChLab(id_1, id_{op_2}, dep_2, lbl_2)(t') = t'$ because $id_1$ is not
present in $t'$, and $Del(id_1)(ChLab(id_1, id_{op_2}, dep_2, lbl_2)(t)) = t'$ because $id_1$ and its subtree are
deleted, whatever its label.

• otherwise: the operations act on different edges.

3. $op_1 = ChLab(id_1, id_{op_1}, dep_1, lbl_1)$

(a) $op_2 = Add(id_{p_2}, id_2)$: this is case 1(c).

(b) $op_2 = Del(id_2)$: this is case 2(c).

(c) $op_2 = ChLab(id_2, id_{op_2}, dep_2, lbl_2)$:

• $id_1 \neq id_2$: the edges are different.

• $id_1 = id_2$:

- $dep_1 < dep_2$: let $t_1 = op_1(op_2(t))$ (1) and let $t_2 = op_2(op_1(t))$ (2). In (1) the label of $id_1$ is
$lbl_2$ and is not changed by $op_1$ (by definition); in (2) the label of $id_1$ is $lbl_1$ and is changed by $op_2$ to
$lbl_2$ (by definition). Therefore $t_1 = t_2$.

- $dep_2 < dep_1$: the same argument applies with the roles of the two labels exchanged.

- $dep_1 = dep_2$: the comparison of $id_{op_1}$ and $id_{op_2}$ decides which label is kept, exactly as in the two
previous cases. By definition $id_{op_1} \neq id_{op_2}$.

□

4.4 Ordered Trees



The previous editing process is defined on unordered trees, whereas XML documents are ordered trees. To make the
algorithm work in this case, we enrich the labeling of edges with ordering information. This shows that our
approach works in this general case. The properties required of the ordering information are:

• The ordering of labels must be a total order.

• The ordering is the same for each site.

• Insertion can be done between two consecutive edges, before the smallest edge and after the largest edge.

The ordering that we design enjoys all these properties. To each edge corresponding to some identifier id we
associate a word on some finite alphabet $\Sigma$ such that two distinct edges correspond to distinct words.

Let $\Sigma_0 = \{a_1, \ldots, a_n\}$ be a finite alphabet such that there is an injective mapping $\varphi$ from ID into
$\Sigma_0^*$. For instance, to a pair (s : n) with s a site number and n an operation number, we can associate the word
$dec(s) \cdot dec(n)$ on the alphabet $\{0, 1, \ldots, 9\} \cup \{\cdot\}$, with $dec(x)$ the representation of $x$ in
base 10.

We extend $\Sigma_0$ by the letter $\#$ used as a separator and $\bot$ used as a minimal element, yielding an alphabet
$\Sigma$. The ordering on letters is $\bot < \# < a_1 < \ldots < a_n$. The lexicographic ordering on words of
$\Sigma^*$ induced by the ordering of letters is a total ordering.

The labeling of an edge $e$ corresponding to the identifier $id_e$ is enriched by a new field $p_e$, a word over the
alphabet $\Sigma$, and we associate to $e$ the word $w_e = p_e \# \varphi(id_e)$. The $\# \varphi(id_e)$ part is added to
guarantee that distinct edges are associated to distinct words.

Proposition 6 The ordering on edges defined by $e \prec f$ iff
$w_e = p_e \# \varphi(id_e) \ll w_f = p_f \# \varphi(id_f)$, where $\ll$ denotes the lexicographic ordering on words,
is a total ordering on edges.

Proof. Since distinct edges have distinct identifiers, the function $\varphi$ is injective, and $\# \varphi(id_e)$ is the
smallest suffix of $w_e$ containing only one occurrence of $\#$, the words associated to distinct edges are distinct.
This proves the proposition since $\ll$ is a total ordering on words. □

Example. Let $e, f$ be edges identified by $id_e = (1 : 10)$ and $id_f = (2 : 1)$. Let $\varphi(id_e) = 1.10$ and
$\varphi(id_f) = 2.1$. Let the priority of $e$ be 12 and the priority of $f$ be 211. The ordering on digits is the usual
one ($i < j$ as letters iff $i < j$ as numbers) and the dot is smaller than every digit. Since
$12\#1.10 \ll 211\#2.1$ in the lexicographic ordering, we get that edge $e$ precedes edge $f$ in the tree.
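
The comparison in this example can be reproduced with a few lines of Python. The sketch assumes the letter ordering $\bot < \# < . < 0 < \ldots < 9$, writes $\bot$ as the character `_`, and uses the hypothetical helper names `key` and `edge_word`.

```python
# Letters listed from smallest to largest; "_" stands for the minimal element.
LETTERS = "_#.0123456789"
RANK = {letter: position for position, letter in enumerate(LETTERS)}

def key(word):
    """Sort key implementing the lexicographic ordering induced by LETTERS."""
    return [RANK[letter] for letter in word]

def edge_word(priority, ident):
    """Word priority # phi(id) associated with an edge, with phi(s : n) = 's.n'."""
    site, count = ident
    return f"{priority}#{site}.{count}"

w_e = edge_word("12", (1, 10))    # "12#1.10"
w_f = edge_word("211", (2, 1))    # "211#2.1"
e_precedes_f = key(w_e) < key(w_f)

# Sorting sibling edges by their word gives the same order on every site.
ordered = sorted([w_f, w_e], key=key)
```

Python compares lists of ranks lexicographically, so `key` can be passed directly to `sorted` to order the edges arising from the same node.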

Let $W$ be the set of words of the form $w_p \# w_{id}$ with $w_p \in \Sigma^*$ and
$w_{id} \in \varphi(ID) \subseteq \Sigma_0^*$.

Proposition 7 Let $w, w' \in W$ be such that $w \ll w'$, where $\ll$ is the lexicographic ordering on words.

(i) There exists a computable $w'' \in W$ such that $w \ll w''$ and $w'' \ll w'$.

(ii) There exist $w_m, w_M \in W$ such that $w_m \ll w$ and $w' \ll w_M$.

Proof. Let $s[k]$ denote the $k$-th letter of a word $s$ and let $|s|$ denote the length of the word $s$.

(i) Let $w = w_p \# w_i \ll w' = w'_p \# w'_i$. We construct $w''$ such that $w \ll w'' \ll w'$. Let $j$ be the minimal
integer such that $w[j] < w'[j]$.

Case 1. $j < |w'_p \# w'_i|$. Let $w''_p$ be such that $|w''_p| = |w'_p \# w'_i|$, $w''_p[k] = w'[k]$ for
$k = 1, \ldots, j$ and $w''_p[k] = \bot$ for $j < k \le |w''_p|$. Given any $w''_i = \varphi(id)$ for some $id$, by
construction the word $w'' = w''_p \# w''_i$ is such that $w \ll w'' \ll w'$.

Case 2. $j = |w'_p \# w'_i|$. Let $w''_p = w_p \# w_i \#$. Given any $w''_i = \varphi(id)$ for some $id$, by
construction the word $w'' = w''_p \# w''_i$ is such that $w \ll w'' \ll w'$.

(ii) Let $w = w_p \# w_i \ll w' = w'_p \# w'_i$. We construct $w_m$ such that $w_m \ll w$. Let $w^m_p[k] = \bot$ for
$k = 1, \ldots, |w_p| + 1$. Given any $w^m_i = \varphi(id)$ for some $id$, by construction the word
$w_m = w^m_p \# w^m_i$ is such that $w_m \ll w$. The same construction works to get $w_M$ such that $w' \ll w_M$
(use $a_n$ instead of $\bot$).

□
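
The construction used in part (i) of the proof can be sketched as follows. The letter ordering, the character `_` standing for $\bot$, and the function names are assumptions of this sketch. The proof takes for granted that the two words differ at some position; the sketch raises an error when one word is a prefix of the other, since that situation is not covered by the two cases.

```python
LETTERS = "_#.0123456789"          # "_" stands for the minimal element
RANK = {letter: position for position, letter in enumerate(LETTERS)}

def key(word):
    """Sort key implementing the lexicographic ordering induced by LETTERS."""
    return [RANK[letter] for letter in word]

def between(w, w2, ident_word):
    """Build a word strictly between w and w2, ending with '#' + ident_word."""
    if not key(w) < key(w2):
        raise ValueError("expected w to be smaller than w2")
    # First position (0-based) where the two words differ.
    j = next((k for k in range(min(len(w), len(w2))) if w[k] != w2[k]), None)
    if j is None:
        raise ValueError("w is a prefix of w2: not covered by the two cases")
    if j + 1 < len(w2):
        # Case 1: copy w2 up to the differing letter, then pad with the minimal letter.
        prefix = w2[: j + 1] + "_" * (len(w2) - j - 1)
    else:
        # Case 2: the words differ on the last letter of w2; extend w instead.
        prefix = w + "#"
    return prefix + "#" + ident_word

case1 = between("12#1.10", "211#2.1", "3.1")
case2 = between("1#1.1", "1#1.2", "3.1")
```

In both cases the new word ends with the encoding of a fresh identifier, so it belongs to the set of admissible words and remains distinct from every existing word.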

An updated set of operations. The data structure is slightly modified since the labels are now elements $(l, id)$
with $id \in ID$ and $l$ a tuple $(lab, id', dep, p)$ with $lab \in \Sigma_L$, $id' \in ID$, $dep \in \mathbb{N}$,
$p \in W$. The field $p$ combined with the identifier $id$ is used to order the edges arising from the same node;
therefore the data structure is similar to semi-structured documents.

The Add and ChLab operations must be slightly modified to handle the new field $p$, which simply amounts to
considering a different set of labels. The dependences between operations are the same as before and we have:

Proposition 8 The set $(Op, \succ_s)$ is an independent set of operations.

Therefore our collaborative editing algorithm works for ordered trees, i.e. XML trees.

5 Experiment and Future Works 

We have implemented the algorithm and the data structure for XML trees in Java (including the ordering
information) and run it on a Mac with a 2.53GHz processor.

The tree data structure is composed of edges. Each edge has the following fields:

• a field for storing its identifier (which is unique).

• a field for storing the sons (which are edges).

• a field for storing its ancestor (which is an edge).

A tree is identified with one edge (the root). Access to an edge having some identifier is done using a hash table
with the identifier as key. The initial document is composed of only one edge: the root, with identifier 0.
Applying an operation op on the tree is performed by the function do : Tree × Op → Tree.

The implementation of do is straightforward. For instance do(Add(idf, id), tree):

(i) creates a new edge with identifier id. 

(ii) asks the hash-table to get the father edge idf 

(iii) stores the father reference. 

(iv) adds new edge into the father list. 

(v) adds new edge references in the hash-table. 
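
The five steps above can be sketched as follows. This is written in Python for brevity and is not the Java prototype; the class names `Edge` and `Tree` and the method name `do_add` are hypothetical, and the sketch looks up the father before creating the edge so that an unknown father leaves the tree unchanged.

```python
class Edge:
    def __init__(self, ident, father=None):
        self.ident = ident      # unique identifier
        self.father = father    # ancestor edge (None for the root)
        self.sons = []          # child edges

class Tree:
    def __init__(self):
        self.root = Edge(0)             # the initial document: a root with identifier 0
        self.index = {0: self.root}     # hash table: identifier -> edge

    def do_add(self, id_f, ident):
        """Apply Add(id_f, ident): create edge ident under the edge id_f."""
        father = self.index.get(id_f)   # (ii) ask the hash table for the father edge
        if father is None:
            return                      # unknown father: the tree is not modified
        edge = Edge(ident, father)      # (i) and (iii): new edge with its father reference
        father.sons.append(edge)        # (iv) add the new edge to the father's list
        self.index[ident] = edge        # (v) register the new edge in the hash table

tree = Tree()
tree.do_add(0, (1, 1))
tree.do_add((1, 1), (2, 1))
tree.do_add((9, 9), (3, 1))             # father (9, 9) does not exist: ignored
```

With the hash table, locating the father takes expected constant time, so applying an Add does not require traversing the document.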

The P2P framework is simulated by random shuffling of the messages that are broadcast. The results obtained
with our prototype are given in Figure 3.

The reader can see that execution time is almost linear. Furthermore, memory consumption (not shown here) is
directly related to the size of the document (since we use no history file, whereas GOTO has a quadratic complexity).

Future works: We plan to extend this work by adding type information like DTDs or XML schemas, which
are used to ensure that XML documents comply with some general structure. The second main extension that we
investigate is the ability to undo some operations, which may require a limited use of a history file to recover
missing information (needed for instance to recover a deleted tree).